Mining TCGA Data Using Boolean Implications
Abstract
Boolean implications (if-then rules) provide a conceptually simple, uniform and highly scalable way to find associations between pairs of random variables. In this paper, we propose to use Boolean implications to find relationships between variables of different data types (mutation, copy number alteration, DNA methylation and gene expression) from the glioblastoma (GBM) and ovarian serous cystadenocarcinoma (OV) data sets from The Cancer Genome Atlas (TCGA). We find hundreds of thousands of Boolean implications in these data sets. A direct comparison of the relationships found by Boolean implications with those found by commonly used methods for mining associations shows that existing methods would miss relationships found by Boolean implications. Furthermore, many relationships exposed by Boolean implications reflect important aspects of cancer biology. Examples of our findings include cis relationships between copy number alteration, DNA methylation and expression of genes, a new hierarchy of mutations and recurrent copy number alterations, loss-of-heterozygosity of well-known tumor suppressors, and the hypermethylation phenotype associated with IDH1 mutations in GBM. The Boolean implication results used in the paper can be accessed at http://crookneck.stanford.edu/microarray/TCGANetworks/.
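To make the idea of an if-then rule between two variables concrete, here is a minimal sketch in Python. It is not the procedure used in the paper: the function name `implication_holds`, the thresholds `max_violation_rate` and `min_support`, and the toy sample data are all assumptions chosen for illustration. The sketch treats "A implies B" as holding when almost no sample has A true and B false.

```python
from typing import Sequence


def implication_holds(a: Sequence[bool], b: Sequence[bool],
                      max_violation_rate: float = 0.05,
                      min_support: int = 3) -> bool:
    """Check the if-then rule "a is true => b is true" across samples.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("a and b must describe the same samples")
    # Number of samples in which the premise holds.
    support = sum(1 for x in a if x)
    # Samples that contradict the rule: premise true, conclusion false.
    violations = sum(1 for x, y in zip(a, b) if x and not y)
    if support < min_support:
        return False  # too few samples with the premise true to judge
    return violations / support <= max_violation_rate


# Toy data (made up for illustration): one entry per tumor sample.
mutated = [True, True, True, True, False, False, False, False]
hypermethylated = [True, True, True, True, True, False, False, False]

# Rule: mutation => hypermethylation
forward = implication_holds(mutated, hypermethylated)
# Converse rule: hypermethylation => mutation
converse = implication_holds(hypermethylated, mutated)
print(forward, converse)
```

The rule is directional: in the toy data every mutated sample is hypermethylated, so the forward rule holds, while one hypermethylated sample without the mutation is enough to break the converse at this threshold.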
Related articles
Mining Large Heterogeneous Cancer Data Sets Using Boolean Implications
Boolean implications (if-then rules) provide a conceptually simple, uniform and highly scalable way to find associations between pairs of random variables. In this paper, we describe their usage in mining associations from large, heterogeneous cancer data sets. Next, we illustrate how Boolean implications were used to discover a new causal association between a mutation and aberrant DNA hyperme...
Some properties of evaluated implications used in knowledge-based systems and data-mining
The core of expert knowledge is typically represented by a set of rules (implications) assigned weights specifying their (un)certainties. The task of the inference mechanism in such rule-based expert systems can be analyzed from the many-valued (fuzzy) logic perspective. On the other hand, implicational relations between two Boolean attributes derived from data (association rules) are quantifie...
A Theoretical Framework for Association Mining Based on the Boolean Retrieval Model
Data mining has been defined as the nontrivial extraction of implicit, previously unknown and potentially useful information from data. Association mining is one of the important sub-fields in data mining, where rules that imply certain association relationships among a set of items in a transaction database are discovered. The efforts of most researchers focus on discovering rules in the form ...
A Novel Boolean Algebraic Framework for Association and Pattern Mining
Data mining has been defined as the nontrivial extraction of implicit, previously unknown and potentially useful information from data. Association mining and sequential mining analysis are considered as crucial components of strategic control over a broad variety of disciplines in business, science and engineering. Association mining is one of the important sub-fields in data mining, where rul...
Using Efficient Boolean Algorithms for Mining Association Rules
In this paper, we use transaction data as the source data for mining, where each transaction records the items a consumer has bought. We mine association rules from two aspects. One is to present a Boolean FP-tree algorithm that mines association rules with Boolean computation, based on the FP-tree algorithm and the CDAR algorithm. The experiments show that the performances of our algorithm are fa...